#python #neural search #Jina #deep learning #clip #images

Image Encoders Part Trois: In which CLIP discovers the transitive property

Published Sep 22, 2021 by Alex C-G

Looking back on my last post about image encoders, I realize I should’ve based my judgement on more than one test. So I’m going to throw a bunch of images at CLIP that are not memes, but that include subjects that are memes. For example, a random non-meme picture of Kermit the Frog, since Kermit is in many memes.

Just for context, this is yet another in the series of posts where I talk about building a [meme search engine with Jina], an open-source framework that lets you build a search engine for any kind of data. Spoiler alert: I work for Jina, and I’m sure my boss will be overjoyed I’m using state-of-the-art technology for something as time-wasting as searching memes. Still, I’m not the kind of person who hand-codes the whole damn thing, so there’s that. (Shout-out to deepfates!)

So here we go yet again. You’d think I’d be tired of this by now. You’d be right. That’s why I’m just testing CLIP now. I tried the same images on BigTransfer as well, but the results were pretty much just random. If you want to see what the results for BigTransfer would look like, just print out a bunch of memes, chuck them down the stairs and take a look. It was like that for pretty much every meme here except “Sparta” (which it surprisingly does quite well on relative to CLIP in all tests).

Towards the end of this, either I started going mad or CLIP did. Scroll on to “The Bloody Unexpected” for more on that.

The Good

Solid matches. CLIP detected features and matched them well.

Query image	Top 3 results	What I expected

The Bad

No close matches whatsoever. Totally failed on all fronts. I mean I guess the “red button” thing did kinda work in that it returned images of another button. But where it got that top image I have no clue.

		WTF, I'm sure I saw a meme of a stick figure with a blue bobble hat. I'm losing it ffs

The Bloody Unexpected

So assuming the transitive property, “Captain Picard” == “Doctor Evil” == “Kermit the Frog” == “Yoda”. I’m baffled by this. I ran the tests several times to make sure I wasn’t saving the output images from a previous search under the wrong name. But nope. This is really what came out. CLIP is discovering whole new connections in its vector index, and possibly forming some warped meaning of life.
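For what it’s worth, “is a close match for” isn’t actually transitive, which may be part of why the chain above looks so unhinged. Here’s a rough sketch of the idea, assuming the search boils down to ranking images by cosine similarity between embedding vectors. The 2-D vectors, the labels, and the `cosine` and `top_k` helpers are all made up for illustration; none of this is Jina’s or CLIP’s actual API, and real CLIP embeddings have hundreds of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Return the k labels in the index whose vectors are most similar to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical toy "embeddings", invented purely for this example.
toy_index = {
    "picard": (1.0, 0.0),
    "dr_evil": (1.0, 1.0),
    "kermit": (0.0, 1.0),
}

picard_evil = cosine(toy_index["picard"], toy_index["dr_evil"])
evil_kermit = cosine(toy_index["dr_evil"], toy_index["kermit"])
picard_kermit = cosine(toy_index["picard"], toy_index["kermit"])

print(f"picard vs dr_evil: {picard_evil:.2f}")
print(f"dr_evil vs kermit: {evil_kermit:.2f}")
print(f"picard vs kermit:  {picard_kermit:.2f}")
print(top_k(toy_index["picard"], toy_index, k=2))
```

In this toy setup “picard” is fairly close to “dr_evil”, and “dr_evil” is fairly close to “kermit”, yet “picard” and “kermit” share nothing at all. So a chain of decent neighbours can end somewhere that has no business matching the image you started with.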

Query image	Top 3 results	What I expected

What have we learned today, children?

Christ knows. My head hurts. If CLIP had a head it might hurt too. I think looking at too many memes has damaged both of us. Quick conclusion: CLIP is good, bad, and downright confusing. Note it down in your copybooks. I need a beer and a re-evaluation of my life choices.
